2023-03-07

Data Set Used

In the slides that follow, New York air quality data collected in 1973 are used to fit simple linear regression models that test whether Ozone (ppb) is associated with Solar Radiation (Ly) or Temperature (°F). The two models are then compared, and recommendations are made for future analysis.

Preliminary Data Analysis

Temperature Linear Regression Model

  • Errors (\(\varepsilon\)) are assumed to be independent and approximately normally distributed with mean 0 and constant variance

  • Proposed Model: \(\text{Ozone} = \beta_0 + \beta_1 \cdot \text{Temp} + \varepsilon\)

  • Regression Line: \(\widehat{\text{Ozone}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Temp}\)
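As a worked example, plugging in the coefficient estimates reported in the Temperature summary output gives the fitted line below, along with its prediction at 80 °F (a value chosen here purely for illustration). The slope means each additional 1 °F is associated with about 2.43 ppb more Ozone on average.

\[ \widehat{\text{Ozone}} = -146.9955 + 2.4287 \cdot \text{Temp} \]
\[ \widehat{\text{Ozone}}\big|_{\text{Temp}=80} = -146.9955 + 2.4287 \cdot 80 = -146.9955 + 194.296 \approx 47.3 \text{ ppb} \]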

Plot of Temperature Model

Solar Radiation Linear Regression Model

  • Errors (\(\varepsilon\)) are assumed to be independent and approximately normally distributed with mean 0 and constant variance

  • Proposed Model: \(\text{Ozone} = \beta_0 + \beta_1 \cdot \text{Solar.R} + \varepsilon\)

  • Regression Line: \(\widehat{\text{Ozone}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Solar.R}\)
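Similarly, the estimates reported in the Solar Radiation summary output give the fitted line below; the prediction at 200 Ly is an illustrative value chosen for this example. The slope means each additional 100 Ly is associated with roughly 12.7 ppb more Ozone on average.

\[ \widehat{\text{Ozone}} = 18.59873 + 0.12717 \cdot \text{Solar.R} \]
\[ \widehat{\text{Ozone}}\big|_{\text{Solar.R}=200} = 18.59873 + 0.12717 \cdot 200 = 18.59873 + 25.434 \approx 44.0 \text{ ppb} \]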

Plot of Solar Radiation Model

Temperature Model Summary Output

tempModel = lm(Ozone~Temp, df)
summary(tempModel)
Call:
lm(formula = Ozone ~ Temp, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-40.729 -17.409  -0.587  11.306 118.271 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -146.9955    18.2872  -8.038 9.37e-13 ***
Temp           2.4287     0.2331  10.418  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.71 on 114 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-squared:  0.4877,    Adjusted R-squared:  0.4832 
F-statistic: 108.5 on 1 and 114 DF,  p-value: < 2.2e-16
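As a consistency check on the output above: the slope's t value is its estimate divided by its standard error, and in simple linear regression the overall F-statistic equals the square of that t value. The adjusted \(R^2\) follows from \(R^2\) with \(n = 116\) observations (114 residual degrees of freedom plus 2 estimated coefficients) and \(p = 1\) predictor. Small discrepancies come from rounding of the printed values.

\[ t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{2.4287}{0.2331} \approx 10.42, \qquad F = t^2 = 10.418^2 \approx 108.5 \]
\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - (1 - 0.4877)\,\frac{115}{114} \approx 0.4832 \]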

Solar Radiation Model Summary Output

radModel = lm(Ozone~Solar.R, df)
summary(radModel)
Call:
lm(formula = Ozone ~ Solar.R, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.292 -21.361  -8.864  16.373 119.136 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.59873    6.74790   2.756 0.006856 ** 
Solar.R      0.12717    0.03278   3.880 0.000179 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31.33 on 109 degrees of freedom
  (42 observations deleted due to missingness)
Multiple R-squared:  0.1213,    Adjusted R-squared:  0.1133 
F-statistic: 15.05 on 1 and 109 DF,  p-value: 0.0001793
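The adjusted \(R^2\) values and overall F-test p-values used in the analysis can be pulled out of the fitted models instead of being copied by hand. This is a minimal sketch, assuming `df` is R's built-in `airquality` data set (its 153 rows match the observation and missing-value counts in both outputs); the helper `fPValue` and the `comparison` table are hypothetical names introduced here.

```r
# Assumption: df is R's built-in airquality data set
df = airquality
tempModel = lm(Ozone ~ Temp, data = df)
radModel  = lm(Ozone ~ Solar.R, data = df)

# Hypothetical helper: p-value of the overall F-test of a fitted lm model
fPValue = function(model) {
  f = summary(model)$fstatistic  # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Side-by-side comparison of the two simple regression models
comparison = data.frame(
  model  = c("Temp", "Solar.R"),
  r2     = c(summary(tempModel)$r.squared, summary(radModel)$r.squared),
  adjR2  = c(summary(tempModel)$adj.r.squared, summary(radModel)$adj.r.squared),
  pValue = c(fPValue(tempModel), fPValue(radModel))
)
print(comparison, digits = 4)
```

Because `lm()` drops incomplete rows separately for each model, the Temperature model uses 116 observations and the Solar Radiation model 111, so this comparison is across slightly different samples.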

Analysis and Future Recommendations

  • Temperature
    • Adjusted \(R^2\): 0.4832
    • p-value: < 2.2e-16
  • Solar Radiation
    • Adjusted \(R^2\): 0.1133
    • p-value: 0.0001793
  • Conclusions
    • Both predictors are statistically significant at \(\alpha = 0.05\); Temperature's much smaller p-value indicates far stronger evidence of a linear association with Ozone than Solar Radiation's
    • The Temperature model fits better: it explains about 48% of the variance in Ozone, versus about 11% for the Solar Radiation model (via adjusted \(R^2\) values)
  • In the future, it is recommended to fit a multiple linear regression model and use Analysis of Variance (ANOVA) F-tests to identify all independent variables (including Wind) that significantly influence Ozone
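The recommended multiple regression could start along these lines. This is a sketch, again assuming `df` is the built-in `airquality` data set; `dat`, `fullModel`, `tempOnly`, and `cmp` are illustrative names. Restricting to complete cases first keeps every model on the same 111 rows, which the nested-model comparison requires.

```r
# Assumption: df is R's built-in airquality data set
df = airquality

# Keep only rows with no missing values in the variables of interest,
# so every model below is fit to the same observations
dat = na.omit(df[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Multiple linear regression with all three candidate predictors
fullModel = lm(Ozone ~ Temp + Solar.R + Wind, data = dat)
summary(fullModel)

# Sequential (Type I) ANOVA table: each term is tested after the terms listed before it
anova(fullModel)

# Nested-model F-test: do Solar.R and Wind add explanatory power beyond Temp alone?
tempOnly = lm(Ozone ~ Temp, data = dat)
cmp = anova(tempOnly, fullModel)
print(cmp)
```

Because `anova(fullModel)` uses sequential sums of squares, each term's test depends on the order of terms in the formula; the nested-model comparison avoids that ambiguity for the specific question of whether the extra predictors help.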